Search Result

News named entity recognition and sentiment classification based on attention-based bi-directional long short-term memory neural network and conditional random field

HU Tiantian, DAN Yabo, HU Jie, LI Xiang, LI Shaobo

Journal of Computer Applications 2020, 40 (7): 1879-1883. DOI: 10.11772/j.issn.1001-9081.2019111965

An Attention-based Bi-directional Long Short-Term Memory neural network with Conditional Random Field (AttBi-LSTM-CRF) model was proposed for the core entity recognition and core entity sentiment analysis tasks on the Sohu coreEntityEmotion_train corpus. Firstly, the text was pre-trained so that each word was mapped into a low-dimensional vector of the same dimension. Then, these vectors were input into the Attention-based Bi-directional Long Short-Term Memory neural network (AttBi-LSTM) to obtain long-term context information and to focus on the information most relevant to the output label. Finally, the optimal label sequence was obtained through the Conditional Random Field (CRF) layer. Comparison experiments were conducted among the AttBi-LSTM-CRF model, the Bi-directional Long Short-Term Memory neural network (Bi-LSTM), AttBi-LSTM, and the Bi-directional Long Short-Term Memory neural network with Conditional Random Field (Bi-LSTM-CRF) model. The experimental results show that the AttBi-LSTM-CRF model achieves an accuracy of 0.78, a recall of 0.667, and an F1 value of 0.553, which are better than those of the comparison models, verifying the superiority of AttBi-LSTM-CRF.
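
The final step of the abstract, picking the best label sequence in the CRF layer, is commonly done with Viterbi decoding. The following is a minimal sketch under that assumption, not the authors' implementation; the function name `viterbi_decode` and the score layout (`emissions` from the network, `transitions` between labels) are hypothetical.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring label sequence and its score.

    emissions[t][j]: score of label j at position t (assumed to come from the network).
    transitions[i][j]: score of moving from label i to label j (assumed CRF parameters).
    """
    n_labels = len(transitions)
    # best[j] = best score of any path that ends in label j at the current position
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_labels):
            # choose the previous label that maximizes the path score into label j
            prev = max(range(n_labels), key=lambda i: best[i] + transitions[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
        best = new_best
        backpointers.append(pointers)
    # trace back from the best final label
    last = max(range(n_labels), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]
```

With strongly negative transition scores between different labels, the decoder prefers a consistent sequence even when one position's emission score points elsewhere, which is the benefit of decoding the sequence jointly rather than label by label.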

Storage location assignment optimization of stereoscopic warehouse based on genetic simulated annealing algorithm

ZHU Jie, ZHANG Wenyi, XUE Fei

Journal of Computer Applications 2020, 40 (1): 284-291. DOI: 10.11772/j.issn.1001-9081.2019061035

To address the storage location assignment problem in automated warehouses, a multi-objective model for storage location assignment in automated stereoscopic warehouses was constructed based on the operational characteristics and security requirements of the warehouse, and an adaptive improved Simulated Annealing Genetic Algorithm (SAGA) based on the Sigmoid curve was proposed to solve the model. Firstly, a storage location optimization model was established with the aims of reducing the loading and unloading time of items, the distance between items in the same group, and the gravity center of the shelf. Then, to overcome the poor local search ability of the Genetic Algorithm (GA) and its tendency to fall into local optima, adaptive crossover and mutation operations based on the Sigmoid curve and a reversal operation were introduced and fused with SAGA. Finally, the optimization performance, stability and convergence of the improved SAGA were tested. The experimental results show that, compared with the Simulated Annealing (SA) algorithm, the proposed algorithm improves the optimization degree of the loading and unloading time of items by 37.7949 percentage points, that of the distance between items in the same group by 58.4630 percentage points, and that of the gravity center of the shelf by 25.9275 percentage points, while also showing better stability and convergence. These results demonstrate the effectiveness of the improved SAGA, which can provide a decision method for storage location assignment optimization in automated warehouses.

Attention mechanism based pedestrian trajectory prediction generation model

SUN Yasheng, JIANG Qi, HU Jie, QI Jin, PENG Yinghong

Journal of Computer Applications 2019, 39 (3): 668-674. DOI: 10.11772/j.issn.1001-9081.2018081645

To address the problems that Long Short-Term Memory (LSTM) considers each pedestrian in isolation and cannot produce predictions covering multiple possibilities, an attention mechanism based generative model for pedestrian trajectory prediction, called AttenGAN, was proposed to model pedestrian interaction and predict multiple reasonable possibilities. The proposed model was composed of a generator and a discriminator. The generator predicted multiple possible future trajectories according to the pedestrian's past trajectory, while the discriminator determined whether a trajectory was real or generated by the generator and gave feedback to the generator, making the predicted trajectories conform better to social norms. The generator consisted of an encoder and a decoder. Taking the information about other pedestrians obtained by the attention mechanism as input, the encoder encoded the trajectory of the pedestrian into a hidden state. Combined with Gaussian noise, the hidden state of the LSTM in the encoder was used to initialize the hidden state of the LSTM in the decoder, which decoded it into the future trajectory prediction. Experiments on the ETH and UCY datasets show that AttenGAN can provide multiple reasonable trajectory predictions and predicts trajectories with higher accuracy than the Linear, LSTM, S-LSTM (Social LSTM) and S-GAN (Social Generative Adversarial Network) models, especially in scenes with dense pedestrian interaction. Visualization of the predicted trajectories obtained by the generator indicates that the model can capture the interaction patterns of pedestrians and jointly predict multiple reasonable possibilities.

Scheduling strategy of cloud robots based on parallel reinforcement learning

SHA Zongxuan, XUE Fei, ZHU Jie

Journal of Computer Applications 2019, 39 (2): 501-508. DOI: 10.11772/j.issn.1001-9081.2018061406

In order to solve the problem of slow convergence of reinforcement learning tasks with large state spaces, a priority-based parallel reinforcement learning task scheduling strategy was proposed. Firstly, the convergence of Q-learning in asynchronous parallel computing mode was proved. Secondly, complex problems were divided according to their state spaces, sub-problems and computing nodes were matched at the scheduling center, and each computing node completed the reinforcement learning task of its sub-problem and gave feedback to the center, realizing parallel reinforcement learning on the computer cluster. Finally, an experimental environment was built on CloudSim, parameters such as the optimal step length, discount rate and sub-problem size were determined, and the performance of the proposed strategy with different numbers of computing nodes was verified by solving practical problems. With 64 computing nodes, the efficiency of the proposed strategy was improved by 61% and 86% compared with round-robin scheduling and random scheduling respectively. Experimental results show that the proposed scheduling strategy can effectively speed up convergence under parallel computing, and that it takes about 1.6×10⁵ s to obtain the optimal strategy for a control problem with a state space of 1 million states.
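
For readers unfamiliar with the step length and discount rate mentioned above, the textbook tabular Q-learning update can be sketched as follows. This is only the standard single-agent update, not the parallel scheduling strategy of the paper; the function name `q_update` and the dictionary-based table are assumptions made for illustration.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q: dict mapping state -> dict mapping action -> value (hypothetical layout).
    alpha: step length (learning rate); gamma: discount rate.
    """
    # value of the best action in the next state; 0.0 if next_state is terminal/unknown
    best_next = max(q[next_state].values()) if q.get(next_state) else 0.0
    old = q[state][action]
    q[state][action] = old + alpha * (reward + gamma * best_next - old)
    return q[state][action]
```

In a parallel setting such as the one described, each computing node would apply updates of this kind to the states of its own sub-problem; how the results are merged at the scheduling center is not specified here.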

Measure method and properties of weighted hypernetwork

LIU Shengjiu, LI Tianrui, YANG Zonglin, ZHU Jie

Journal of Computer Applications 2019, 39 (11): 3107-3113. DOI: 10.11772/j.issn.1001-9081.2019050806

A hypernetwork is a kind of network that is more complex than an ordinary complex network. Because each of its hyperedges can connect any number of nodes, a hypernetwork can describe complex systems in the real world more appropriately than a complex network. To address the shortcomings of existing measures of hypernetworks, a new measure, the Hypernetwork Dimension (HD), was proposed. The hypernetwork dimension was expressed as twice the ratio of the logarithm of the sum of all nodes' weights and product of corresponding hyperedge's weight in all hyperedges to the logarithm of the product of the sum of hyperedges' weights and the sum of nodes' weights. The hypernetwork dimension can be applied to weighted hypernetworks whose node weights and hyperedge weights take many different numerical types, such as positive real numbers, negative real numbers, pure imaginary numbers, and even complex numbers. Finally, several important properties of the proposed hypernetwork dimension were discussed.

Multi-center convolutional feature weighting based image retrieval

ZHU Jie, ZHANG Junsan, WU Shufang, DONG Yukun, LYU Lin

Journal of Computer Applications 2018, 38 (10): 2778-2781. DOI: 10.11772/j.issn.1001-9081.2018041100

Deep convolutional features can provide rich semantic information for image content description. In order to highlight the object content in the image representation, a multi-center convolutional feature weighting method was proposed based on the relationship between high-response positions and object regions. Firstly, a pre-trained deep network model was used to extract the deep convolutional features. Secondly, the activation map was obtained by summing the feature maps over all channels, and the few positions with the highest responses were taken as the centers of the object. Thirdly, the number of centers was taken as the scale, and the descriptors at different positions were weighted according to the distances between these positions and the centers. Finally, the image representation for image retrieval was generated by merging the image features obtained with different numbers of centers. Compared with the Sum-pooled Convolutional (SPoC) algorithm and the Cross-dimensional Weighting (CroW) algorithm, the proposed method provides scale information and highlights the object content in the image representation, and achieves excellent retrieval results on the Holiday, Oxford and Paris image retrieval datasets.
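
The second and third steps can be illustrated with a small sketch in pure Python. The abstract does not give the weighting function, so the Gaussian decay in `center_weights` is an assumption, as are all function names and the nested-list tensor layout (channels × height × width).

```python
import math


def activation_map(features):
    """Sum a C x H x W feature tensor (nested lists) over channels into an H x W map."""
    h, w = len(features[0]), len(features[0][0])
    return [[sum(ch[y][x] for ch in features) for x in range(w)] for y in range(h)]


def top_centers(act, k):
    """Return the k positions (y, x) with the highest activation."""
    positions = [(y, x) for y in range(len(act)) for x in range(len(act[0]))]
    positions.sort(key=lambda p: act[p[0]][p[1]], reverse=True)
    return positions[:k]


def center_weights(act, centers, sigma=1.0):
    """Weight each position by its distance to the nearest center.

    The Gaussian decay is an assumed weighting function, not the paper's.
    """
    weights = []
    for y in range(len(act)):
        row = []
        for x in range(len(act[0])):
            # squared distance to the closest center
            d2 = min((y - cy) ** 2 + (x - cx) ** 2 for cy, cx in centers)
            row.append(math.exp(-d2 / (2 * sigma ** 2)))
        weights.append(row)
    return weights
```

Calling `top_centers` with several values of `k` and pooling the weighted descriptors for each would give one feature per scale, which could then be merged as the final step describes.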

Bayesian clustering algorithm for categorical data

ZHU Jie, CHEN Lifei

Journal of Computer Applications 2017, 37 (4): 1026-1031. DOI: 10.11772/j.issn.1001-9081.2017.04.1026

To address the difficulty of defining a meaningful distance measure for categorical data clustering, a new categorical data clustering algorithm was proposed based on Bayesian probability estimation. Firstly, a probability model with automatic attribute-weighting was proposed, in which each categorical attribute is assigned an individual weight to indicate its importance for clustering. Secondly, a clustering objective function was derived using maximum likelihood estimation and Bayesian transformation, then a partitioning algorithm was proposed to optimize the objective function which groups data according to the weighted likelihood between objects and clusters instead of the pairwise distances. Thirdly, an expression for estimating the attribute weights was derived, indicating that the weight should be inversely proportional to the entropy of category distribution. The experiments were conducted on some real datasets and a synthetic dataset. The results show that the proposed algorithm yields higher clustering accuracy than the existing distance-based algorithms, achieving 5%-48% improvements on the Bioinformatics data with meaningful attribute-weighting results for the categorical attributes.
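
The relation between attribute weight and entropy stated above can be sketched as follows. This is only an illustration of "weight inversely proportional to the entropy of the category distribution", computed here over the objects of a single cluster; the normalization to a sum of 1, the small constant `eps`, and the function names are assumptions rather than the paper's exact expression.

```python
import math
from collections import Counter


def attribute_entropy(values):
    """Shannon entropy (natural log) of the category distribution of one attribute."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def entropy_weights(rows, eps=1e-9):
    """Attribute weights inversely proportional to entropy, normalized to sum to 1.

    rows: list of tuples, one categorical value per attribute (hypothetical layout).
    eps avoids division by zero when an attribute takes a single category.
    """
    columns = list(zip(*rows))
    inverse = [1.0 / (attribute_entropy(col) + eps) for col in columns]
    total = sum(inverse)
    return [v / total for v in inverse]
```

An attribute whose categories are nearly constant within a cluster has low entropy and so receives a large weight, matching the intuition that it is informative for that cluster.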

Color based compact hierarchical image representation

ZHU Jie, WU Shufang, XIE Bojun, MA Liyan

Journal of Computer Applications 2017, 37 (11): 3238-3243. DOI: 10.11772/j.issn.1001-9081.2017.11.3238

The spatial pyramid matching method provides spatial information by splitting an image into different cells. However, spatial pyramid matching cannot match different parts of objects well. A hierarchical image representation method based on Color Level (CL) was proposed. In the CL algorithm, the class-specific discriminative colors of different levels were obtained from the viewpoint of feature fusion, and an image was then iteratively split into different levels based on these discriminative colors. Finally, the image representation was constructed by concatenating the histograms of the different levels. To reduce the dimensionality of the image representation, the Divisive Information-Theoretic feature Clustering (DITC) method was used to cluster the dictionary, and the resulting compact dictionary was used for the final image representation. Classification results on the Soccer, Flower 17 and Flower 102 datasets demonstrate that the proposed method obtains satisfactory results on these datasets.

Chinese natural language interface based on paraphrasing

ZHANG Junchi, HU Jie, LIU Mengchi

Journal of Computer Applications 2016, 36 (5): 1290-1295. DOI: 10.11772/j.issn.1001-9081.2016.05.1290

A novel method for a Chinese Natural Language Interface to Database (NLIDB) based on Chinese paraphrasing was proposed to solve the problems that traditional methods based on syntactic parsing cannot achieve high accuracy and require a large manually labeled training corpus. First, the key entities of user statements were extracted from the database, and candidate tree sets and their tree expressions were generated. Then, the most relevant semantic expressions were selected by a paraphrase classifier trained on an Internet Q&A corpus. Finally, the candidate trees were translated into Structured Query Language (SQL). The proposed method achieved F1 scores of 83.4% and 90% on the Chinese America Geography (GeoQueries880) and Questions about Restaurants (RestQueries250) datasets respectively, better than the syntax-based method. The experimental results demonstrate that an NLIDB based on paraphrasing can better handle the semantic gaps between users and databases.

Hadoop adaptive task scheduling algorithm based on computation capacity difference between node sets

ZHU Jie, LI Wenrui, WANG Jiangping, ZHAO Hong

Journal of Computer Applications 2016, 36 (4): 918-922. DOI: 10.11772/j.issn.1001-9081.2016.04.0918

To address the fixed task progress proportions and the passive selection of slow tasks in speculative task execution algorithms for heterogeneous clusters, an adaptive task scheduling algorithm based on the computation capacity difference between node sets was proposed. The computation capacity difference between node sets was quantified so that tasks could be scheduled on fast and slow node sets, and dynamic feedback on node and task speeds was used to update the slow node set in time, improving resource utilization and task parallelism. Within the two node sets, task progress proportions were adjusted dynamically to identify slow tasks more accurately, and fast nodes were selected to execute backup tasks for slow tasks through substitute execution, improving task execution efficiency. The experimental results show that, compared with the Longest Approximate Time to End (LATE) algorithm, the proposed algorithm reduces the running time by 5.21%, 20.51% and 23.86% on the short job set, the mixed-type job set and the mixed-type job set with node performance degradation respectively, and significantly reduces the number of initiated backup tasks. The proposed algorithm adapts tasks to node differences and effectively improves overall job execution efficiency while reducing slow backup tasks.

Fuzzy clustering algorithm based on midpoint density function

ZHOU Yueyue, HU Jie, SU Tao

Journal of Computer Applications 2016, 36 (1): 150-153. DOI: 10.11772/j.issn.1001-9081.2016.01.0150

In the traditional Fuzzy C-Means (FCM) clustering algorithm, the initial clustering centers are uncertain and the number of clusters must be preset, which may lead to inaccurate results. A fuzzy clustering algorithm based on the midpoint density function was proposed. Firstly, the idea of stepwise regression was integrated into the selection of the initial clustering centers to keep convergence from being trapped in a local loop; then the number of clusters was determined; finally, according to the results, validity indexes of fuzzy clustering, including overlap degree and resolution, were evaluated to determine the optimal number of clusters. The results show that, compared with the traditional improved FCM, the proposed algorithm reduces the number of iterations and increases the average accuracy by 12%. The experimental results also show that the proposed algorithm reduces the processing time of clustering and outperforms the comparison algorithm in average accuracy and clustering performance indexes.
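
As background for the FCM baseline mentioned above, the textbook membership update can be sketched as below. It shows only standard FCM, not the midpoint density function or the center selection of the proposed algorithm; the function name `fcm_memberships` and the tuple-based point format are assumptions.

```python
def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership update: result[i][j] is the membership of point i in cluster j.

    m > 1 is the fuzzifier; points and centers are tuples of floats.
    """
    power = 2.0 / (m - 1.0)
    memberships = []
    for p in points:
        # Euclidean distance from the point to every center
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5 for c in centers]
        if any(d == 0.0 for d in dists):
            # the point coincides with a center: share full membership among such centers
            zeros = [1.0 if d == 0.0 else 0.0 for d in dists]
            memberships.append([z / sum(zeros) for z in zeros])
            continue
        memberships.append([1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists])
    return memberships
```

Because every row depends on the current centers, a poor initial choice of centers affects all later iterations, which is the sensitivity the abstract sets out to reduce.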

Resource matching maximum set job scheduling algorithm under Hadoop

ZHU Jie, LI Wenrui, ZHAO Hong, LI Ying

Journal of Computer Applications 2015, 35 (12): 3383-3386. DOI: 10.11772/j.issn.1001-9081.2015.12.3383

To address the inefficient execution of jobs with a high proportion of resources in job scheduling algorithms under the current hierarchical queue structure, a resource matching maximum set algorithm was proposed. The proposed algorithm analysed job characteristics and introduced the percentage of completion, waiting time, priority and number of reschedulings as urgency value factors. Jobs with a high proportion of resources or a long waiting time were considered preferentially to improve fairness among jobs. When available resources were limited, double queues were used to preferentially select jobs with high urgency values, and the maximum job set was selected from job sets with different proportions of resources to achieve scheduling balance. Compared with the Max-min fairness algorithm, the proposed algorithm decreases the average waiting time and improves resource utilization. The experimental results show that with the proposed algorithm, the running time of a same-type job set consisting of jobs with different proportions of resources is reduced by 18.73%, and the running time of jobs with a high proportion of resources is reduced by 27.26%; the corresponding reductions for the mixed-type job set are 22.36% and 30.28%. The results indicate that the proposed algorithm can effectively reduce the waiting time of jobs with a high proportion of resources and improve overall job execution efficiency.

Feature selection method based on integration of mutual information and fuzzy C-means clustering

ZHU Jiewen, XIAO Jun

Journal of Computer Applications 2014, 34 (9): 2608-2611.

A large number of redundant features may reduce the performance of data classification on massive datasets, so a new automatic feature selection method based on the integration of Mutual Information (MI) and Fuzzy C-Means (FCM) clustering, named FCC-MI, was proposed. Firstly, MI and its correlation function were analyzed, and the features were sorted according to their correlation values. Secondly, the data were grouped according to the feature with the maximum correlation, and the optimal number of features was determined automatically by the FCM clustering method. Finally, the features were selected using the correlation values. Experiments on seven datasets from the UCI machine learning database were conducted to compare FCC-MI with three methods from the literature: WCMFS (Within class variance and Correlation Measure Feature Selection), B-AMBDMI (Based on Approximating Markov Blank and Dynamic Mutual Information), and T-MI-GA (Two-stage feature selection algorithm based on MI and GA). The theoretical analysis and experimental results show that the proposed method not only improves the efficiency of data classification but also maintains classification accuracy and automatically determines the optimal feature subset, reducing the number of features in the dataset; it is therefore suitable for feature reduction and analysis of massive data with highly correlated features.

Three-queue job scheduling algorithm based on Hadoop

ZHU Jie, ZHAO Hong, LI Wenrui

Journal of Computer Applications 2014, 34 (11): 3227-3230. DOI: 10.11772/j.issn.1001-9081.2014.11.3227

The single-queue job scheduling algorithm in a homogeneous Hadoop cluster makes short jobs wait and leads to low resource utilization. Multi-queue scheduling algorithms address unfairness and low execution efficiency, but most of them require manual parameter setting, let queues occupy each other's resources, and are more complex. To resolve these problems, a three-queue scheduling algorithm was proposed. The algorithm used job classification, dynamic priority adjustment, a shared resource pool and job preemption to achieve fairness, simplify the scheduling flow of normal jobs and improve concurrency. Comparison experiments with the First In First Out (FIFO) algorithm were conducted under three situations: a high percentage of short jobs, similar percentages of all job types, and mostly general jobs with occasional long and short jobs. The proposed algorithm reduced the running time of jobs. The experimental results show that the improvement in execution efficiency is not obvious when most jobs are short; however, when all types of jobs are balanced, the improvement is remarkable. This is consistent with the design principles of the algorithm: prioritizing short jobs, simplifying the scheduling flow of normal jobs and taking long jobs into account, which improves scheduling performance.

New disparity estimation method for multiview video based on Mean Shift

HU Bo, DAI Wanchang, XIAO Zhijian, WU Jianping, HU Jie

Journal of Computer Applications 2013, 33 (08): 2297-2299.

Concerning the high computation complexity of disparity estimation for multiview video encoding, a new method for disparity estimation based on Mean Shift was proposed. The relationship between disparity vector and motion vector in the spatio-temporal domain was analyzed, and the prediction disparity vector was calculated. The initial searching position for disparity matching was confirmed to be the initial value of iteration calculation by Mean Shift, and the macroblock's optimum matching in reference frame was achieved. The experimental results show that the proposed method can save more than 94% encoding time with a negligible drop in rate distortion compared with the full search algorithm. Compared to the fast searching algorithm, this method saves more than 10% encoding time and improves rate distortion.